Word extraction using irregular pyramid
Authors
Abstract
This paper proposes a new algorithm for extracting text from imaged documents, focusing on the extraction of word groups. An irregular pyramid structure serves as the basis of the algorithm. What distinguishes the algorithm is that it includes strategic background information in the analysis, which most techniques discard: both foreground regions (i.e. text areas) and a portion of the background regions (i.e. white areas) are examined. The algorithm rests on the concept of “closeness”: text elements within a group are closer to each other, in terms of spatial distance, than to other text areas. The results are encouraging: the algorithm is able to correctly group words of different sizes, fonts, arrangements and orientations.
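The “closeness” idea can be illustrated with a small sketch. This is not the paper's irregular pyramid algorithm, and it ignores the background regions entirely; it is a plain union-find grouping over bounding boxes, shown only to make the spatial-distance criterion concrete. The names `box_gap` and `group_by_closeness`, the box format `(x_min, y_min, x_max, y_max)`, and the `max_gap` threshold are all assumptions introduced for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) of one text component.
Box = Tuple[float, float, float, float]


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_by_closeness(boxes: List[Box], max_gap: float) -> List[List[int]]:
    """Group box indices so that boxes linked by gaps <= max_gap share a group.

    Illustrative stand-in only: a union-find over pairwise gaps,
    not the irregular pyramid described in the abstract.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of boxes whose gap is within the threshold.
    for i, j in combinations(range(len(boxes)), 2):
        if box_gap(boxes[i], boxes[j]) <= max_gap:
            parent[find(i)] = find(j)

    # Collect indices by their root.
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Five character-sized boxes on one line: three close together, then two.
    chars = [(0, 0, 10, 10), (12, 0, 22, 10), (24, 0, 34, 10),
             (60, 0, 70, 10), (72, 0, 82, 10)]
    print(group_by_closeness(chars, max_gap=3.0))
```

A fixed `max_gap` is the weak point of this sketch: it cannot adapt to text of different sizes, which is the kind of variation the abstract says the pyramid-based method handles.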
Similar resources
Word and Sentence Extraction Using Irregular Pyramid
This paper presents the results of our continued work on enhancing our previously proposed algorithm. Moving beyond the extraction of word groups, and based on the same irregular pyramid structure, the newly proposed algorithm groups the extracted words into sentences. The uniqueness of the algorithm is in its ability to process text of a wide variation in terms of size, font, orientati...
Detection of Word Groups Based on Irregular Pyramid
This paper proposes a new algorithm to detect word groups in imaged documents, using an irregular pyramid. What distinguishes the algorithm is that it includes strategic background information in the analysis, which most techniques discard. Both foreground regions (i.e. text areas) and a portion of the background regions (i.e. white areas) are examined. The algorithm is based on the conce...
Adaptive Region Growing Color Segmentation for Text Using Irregular Pyramid
This paper presents the results of an adaptive region growing segmentation technique for color document images using an irregular pyramid structure. The emphasis is on the segmentation of textual components for subsequent extraction in document analysis. The segmentation is done in the RGB color space. A simple color distance measurement and a category of color thresholds are derived. The propo...
Using Irregular Pyramid for Text Segmentation and Binarization of Gray Scale Image
Compared to the binary images that most text extraction methods work on, gray scale images provide much more information for the extraction task. On the other hand, complications also arise in separating the textual content from its background region (i.e. thresholding) before the actual text extraction process can begin. Differing from the usual sequence of processes where document images ...
Using Irregular Pyramid for Text Segmentation and Binarization of Gray Scale Images
Compared to the binary images that most text extraction methods work on, gray scale images provide much more information for the extraction task. On the other hand, complications also arise in separating the textual content from its background region (i.e. thresholding) before the actual text extraction process can begin. Differing from the usual sequence of processes where document images a...